Tight Bounds for Hashing Block Sources

Authors

  • Kai-Min Chung
  • Salil P. Vadhan
Abstract

It is known that if a 2-universal hash function H is applied to elements of a block source (X1, ..., XT), where each item Xi has enough min-entropy conditioned on the previous items, then the output distribution (H, H(X1), ..., H(XT)) will be "close" to the uniform distribution. We provide improved bounds on how much min-entropy per item is required for this to hold, both when we ask that the output be close to uniform in statistical distance and when we only ask that it be statistically close to a distribution with small collision probability. In both cases, we reduce the dependence of the min-entropy on the number T of items from 2 log T in previous work to log T, which we show to be optimal. This leads to corresponding improvements to the recent results of Mitzenmacher and Vadhan (SODA '08) on the analysis of hashing-based algorithms and data structures when the data items come from a block source.
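As a rough illustration of the setup in the abstract, the sketch below samples one hash function from a 2-universal family and applies that same function to every item of a sequence, returning the pair (H, (H(X1), ..., H(XT))). It is not taken from the paper: the Carter-Wegman family h(x) = ((a·x + b) mod P) mod m, the prime P, the integer keys, and all function names are assumptions chosen for the example, and the code does not model min-entropy or reproduce any of the paper's bounds.

```python
import random

# Assumption: keys are non-negative integers smaller than this Mersenne prime.
P = 2_147_483_647  # 2^31 - 1


def sample_hash_params(rng):
    """Draw (a, b) describing one member of the Carter-Wegman family."""
    return rng.randrange(1, P), rng.randrange(0, P)


def apply_hash(params, x, m):
    """Evaluate h_{a,b}(x) = ((a*x + b) mod P) mod m."""
    a, b = params
    return ((a * x + b) % P) % m


def hash_block_source(items, m, rng):
    """Sample ONE hash function and apply it to every item.

    Returns the description of H together with (H(X1), ..., H(XT)),
    mirroring the output distribution discussed in the abstract.
    """
    params = sample_hash_params(rng)
    return params, [apply_hash(params, x, m) for x in items]


def empirical_collision_rate(x, y, m, trials, rng):
    """Estimate Pr over the choice of H that H(x) == H(y)."""
    hits = 0
    for _ in range(trials):
        params = sample_hash_params(rng)
        if apply_hash(params, x, m) == apply_hash(params, y, m):
            hits += 1
    return hits / trials
```

For two distinct keys, 2-universality means the collision probability over the choice of H is at most 1/m, so `empirical_collision_rate(7, 1234, 16, 20000, random.Random(0))` should come out near or below 1/16.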

Similar resources

The Tree Model for Hashing: Lower and Upper Bounds

We define a new, simple, and general model for hashing. The basic model, together with several variants, captures many natural (sequential and parallel) hashing algorithms and represents common hashing practice. Our main results exhibit tight tradeoffs between hash table size and the number of applications of a hash function on a single key.

On Finite Block-Length Quantization Distortion

We investigate the upper and lower bounds on the quantization distortions for independent and identically distributed sources in the finite block-length regime. Based on the convex optimization framework of the rate-distortion theory, we derive a lower bound on the quantization distortion under finite block-length, which is shown to be greater than the asymptotic distortion given by the ratedis...

Patterns of i.i.d. Sequences and Their Entropy - Part II: Bounds for Some Distributions

A pattern of a sequence is a sequence of integer indices with each index describing the order of first occurrence of the respective symbol in the original sequence. In a recent paper, tight general bounds on the block entropy of patterns of sequences generated by independent and identically distributed (i.i.d.) sources were derived. In this paper, precise approximations are provided for the pat...

Tight Lower Bounds for Data-Dependent Locality-Sensitive Hashing

We prove a tight lower bound for the exponent ρ for data-dependent Locality-Sensitive Hashing schemes, recently used to design efficient solutions for the c-approximate nearest neighbor search. In particular, our lower bound matches the bound of ρ ≤ 1/(2c−1) + o(1) for the ℓ1 space, obtained via the recent algorithm from [Andoni-Razenshteyn, STOC'15]. In recent years it emerged that data-dependent h...

Patterns of i.i.d. Sequences and Their Entropy - Part I: General Bounds

Tight bounds on the block entropy of patterns of sequences generated by independent and identically distributed (i.i.d.) sources are derived. A pattern of a sequence is a sequence of integer indices with each index representing the order of first occurrence of the respective symbol in the original sequence. Since a pattern is the result of data processing on the original sequence, its entropy c...

Publication date: 2008